\subsection{Definitions}
\begin{definition}
	Let \( \Omega \) be a finite set of symbols, and let \( \Omega^\star \) be the set of \( \Omega \)-strings.
	We call elements of \( \Omega^\star \times \Omega^\star \) \emph{rewrite rules} or \emph{production rules}.
	Such elements \( (\alpha, \beta) \) are written \( \alpha \to \beta \).
\end{definition}
Informally, we interpret a rewrite rule \( \alpha \to \beta \) as a procedure that replaces an occurence of \( \alpha \) in a string with \( \beta \).
\begin{definition}
	A pair \( R = (\Omega, P) \) is called a \emph{rewrite system} if \( P \) is a finite set of rewrite rules.
\end{definition}
\begin{proposition}
	If \( \Omega \) is finite, there are only countably many rewrite systems on \( \Omega \).
\end{proposition}
\begin{proof}
	\( \Omega^\star \) is countable, so \( \Omega^\star \times \Omega^\star \) is countable.
	Every \( P \) is an element of \( \mathrm{Fin}(\Omega^\star \times \Omega^\star) \), hence this is countable.
\end{proof}
\begin{definition}
	If \( R = (\Omega, P) \) is a rewrite system, and \( \sigma, \tau \in \Omega^\star \), we write \( \sigma \xrightarrow R_1 \tau \), pronounced `\( \sigma \) is rewritten to \( \tau \) in one step' or `\( R \) produces \( \tau \) from \( \sigma \) in one step', if there exist \( \alpha, \beta, \gamma, \delta \in \Omega^\star \) such that \( \sigma = \alpha \gamma \beta \), \( \tau = \alpha \delta \beta \), and \( \gamma \to \delta \in P \).

	The relation \( \xrightarrow R \) is the reflexive and transitive closure of \( \xrightarrow R_1 \).
	The sequence \( \sigma_0 \xrightarrow R_1 \sigma_1 \xrightarrow R_1 \dots \xrightarrow R_1 \sigma_n \) is called a \emph{\( R \)-derivation of length \( n \)} of \( \sigma_n \) from \( \sigma_0 \).
	We write
	\[ \mathcal D(R, \sigma) = \qty{\tau \in \Omega^\star \mid \sigma \xrightarrow R \tau} \]
	for the set of strings that can be rewritten, produced, or derived from \( \sigma \).
\end{definition}

\subsection{Relation to languages}
In language, we can think of \( \Omega \) as representing letters, and \( \Omega^\star \) representing words.
We could alternatively consider \( \Omega \) to represent words, and \( \Omega^\star \) to represent sentences.
Further, \( \Omega \) could represent sentences, and then \( \Omega^\star \) would represent texts.

However, not all elements of \( \Omega^\star \) in each level is a valid word, sentence, or text.
We therefore would like to describe which elements of \( \Omega^\star \) are \emph{well-formed}.
Natural languages spoken by humans are finite, and normally the way we determine whether a string is a word is by consulting a dictionary, which at its core is a lookup table that determines whether any given string is or is not a word.

Even though in practice languages are finite, Chomsky realised that it makes more sense to model them as infinite sets, due to a property known as \emph{linguistic recursion} that seems to be an important feature of human language.
Linguistic recursion can be seen through the following example: when \( X \) is a sentence in English, `\( E \) observes that \( X \)' is also a grammatical sentence in English.
If we define an upper sentence length in English, we have to arbitrarily define an upper limit on this form of recursion.

There is a difference between a sentence being grammatical and being meaningful.
One notable example is the grammatically correct `colourless green ideas sleep furiously' that does not have meaning, to contrast with `furiously sleep ideas green colourless' which is neither grammatically correct or meaningful.
We can use grammar to distinguish these two sentences, but we cannot distinguish algebraically whether a sentence has meaning.

\begin{example}
	Consider the following \emph{generative grammar} of rewrite rules for English.
	\begin{align*}
		S &\to \text{NP VP} \\
		\text{NP} &\to \text{Adj NP} \\
		\text{NP} &\to \text{Noun} \\
		\text{VP} &\to \text{Verb} \\
		\text{VP} &\to \text{Verb Adv}
	\end{align*}
	This rewrite system allows us to derive the sentence `colourless green ideas sleep furiously' from \( S \).
\end{example}

\subsection{Grammars}
\begin{definition}
	Let \( \Sigma \) be an \emph{alphabet} of \emph{letters} or \emph{terminal symbols}, and let \( V \) be a set of \emph{variables} or \emph{nonterminal symbols}, such that \( \Sigma, V \) are nonempty and disjoint.
	Let \( \Omega = \Sigma \cup V \).
	\( a, b, c, \dots \) refer to letters and \( A, B, C, \dots \) refer to variables.
	Elements of \( \mathbb W = \Sigma^\star \subseteq \Omega^\star \) are called \emph{words}.
	\( u, v, w, \dots \) refer to words.
	We denote \( \mathbb W^+ = \Sigma^\star \setminus \qty{\varepsilon} \) for the set of nonempty words.
	A subset of \( \mathbb W \) is called a \emph{language}.
\end{definition}
Note that there are uncountably many languages over any nonempty alphabet.
\begin{definition}
	A tuple \( G = (\Sigma, V, P, S) \) is called a \emph{grammar} if \( \Sigma, V \) are nonempty and disjoint denoting \( \Omega = \Sigma \cup V \), such that \( R = (\Omega, P) \) is a rewrite system, and \( S \in V \) is the \emph{start symbol}.
	Since grammars give rise to a natural rewrite system, our notation for rewrite systems may also be used for grammars.
	For example,
	\[ \mathcal D(G, \sigma) = \mathcal D(R, \sigma);\quad \sigma \xrightarrow G_{(1)} \tau \iff \sigma \xrightarrow R_{(1)} \tau \]
	We define the \emph{language generated by the grammar} to be
	\[ \mathcal L(G) = \mathcal D(G,S) \cap \mathbb W \]
\end{definition}
\begin{example}
	If there is no rule of the form \( S \to \alpha \) in \( P \), then \( \mathcal D(G,S) = \qty{S} \) and thus \( \mathcal L(G) = \varnothing \) because the start symbol is not a word.
	Likewise, if there is no rule of the form \( \alpha \to w \) for \( w \in \mathbb W \) in \( P \), then \( \mathcal D(G,S) \) contains no words, so \( \mathcal L(G) = \varnothing \).
\end{example}
\begin{example}
	Let \( \Sigma = \qty{a} \), \( V = \qty{S} \), \( P_0 = \qty{S \to aaS, S \to a} \), \( G_0 = (\Sigma, V, P, S) \).
	We will show \( \mathcal L(G_0) = \qty{a^{2n+1} \mid n \in \mathbb N} \).
	First, every element of \( \mathcal D(G,S) \) that is produced by \( G_0 \) is of odd length, which can be seen by induction on the length of the derivation, since each production rule preserves parity of length.
	Conversely, each \( a^{2n+1} \) can be produced by the rewrite rules, by applying \( S \to aaS \) a total of \( n \) times, and then applying \( S \to a \).

	Note that the only requirement of the proof was that odd length is preserved. Thus, the following sets of production rules also produce the same language.
	\begin{itemize}
	\item \( P_1 = \qty{S \to aSa, S \to a} \)
	\item \( P_2 = \qty{S \to Saa, S \to a} \)
	\item \( P_3 = \qty{S \to aaS, S \to aaSaa, S \to a} \)
	\end{itemize}
	This notion is called \emph{equivalence of grammars}.
\end{example}

\subsection{Equivalent grammars}
\begin{definition}
	Grammars \( G, G' \) are \emph{equivalent} if \( \mathcal L(G) = \mathcal L(G') \).
\end{definition}
We intend to show that for a fixed finite set \( \Sigma \), there are only countably many languages of the form \( \mathcal L(G) \) for a grammar \( G \) (which may have arbitrary variable sets \( V \)).
\begin{definition}
	Let \( G = (\Sigma, V, P, S), G' = (\Sigma, V', P', S') \) be grammars on the same alphabet \( \Sigma \).
	A function \( f \colon \Omega \to \Omega' = \Sigma \cup V \to \Sigma \cup V' \) is called an \emph{isomorphism} if
	\begin{enumerate}
		\item \( \eval{f}_\Sigma = \mathrm{id} \);
		\item \( f(S) = S' \);
		\item \( \eval{f}_V \) is a bijection from \( V \) to \( V' \);
		\item \( \alpha \to \beta \in P \iff f(\alpha) \to f(\beta) \in P' \).
	\end{enumerate}
	Note that here, since \( \alpha, \beta \in \Omega^\star \), \( f(\alpha) = \hat f(\alpha) \) is the extension of \( f \) to \( \Omega^\star \).
\end{definition}
\begin{proposition}
	Isomorphic grammars are equivalent.
\end{proposition}
\begin{proof}
	If \( f \) is an isomorphism from \( G \) to \( G' \), \( f^{-1} \) is an isomorphism from \( G' \) to \( G \).
	Thus, by antisymmetry of \( \subseteq \), it suffices to show that \( \mathcal L(G) \subseteq \mathcal L(G') \).
	Let \( w \in \mathcal L(G) \).
	Then there is a derivation in \( G \) of \( w \) from \( S \):
	\[ S = \sigma_0 \xrightarrow G_1 \sigma_1 \xrightarrow G_1 \dots \xrightarrow G_1 \sigma_n = w \]
	Applying \( f \) to each element of this sequence,
	\[ S' = f(\sigma_0) \xrightarrow {G'}_1 f(\sigma_1) \xrightarrow {G'}_1 \dots \xrightarrow {G'}_1 f(\sigma_n) = w \]
	The start and end symbols take these values due to property (i) and (ii).
	Each arrow holds by property (iv).
	This is a derivation of \( w \) from \( S' \) in \( G \).
	Hence \( w \in \mathcal L(G') \).
\end{proof}
\begin{proposition}
	If \( G = (\Sigma, V, P, S) \) and \( V' \) is such that \( \abs{V} = \abs{V'} \), then there exist \( P', S' \) such that \( \mathcal L(G) = \mathcal L(G') \) with \( G' = (\Sigma, V', P', S') \).
\end{proposition}
\begin{proof}
	Since \( \abs{V} = \abs{V'} \), there exists a bijection \( f \colon V \to V' \).
	Then, extending this to \( \Omega = \Sigma \cup V \) by letting \( f(a) = a \) for all \( a \in \Sigma \), this satisfies properties (i) and (iii) of the definition of an isomorphism.
	Define \( S' = f(S) \) and \( P' = \qty{f(\alpha) \to f(\beta) \mid \alpha \to \beta \in P} \), so that properties (ii) and (iv) are satisfied.
	Then \( (\Sigma, V, P, S) \) is isomorphic to \( (\Sigma, V', P', S') \) and thus they have the same language.
\end{proof}
\begin{proposition}
	There are only countably many languages of the form \( \mathcal L(G) \) for some grammar \( G \) on a fixed alphabet \( \Sigma \).
\end{proposition}
\begin{proof}
	Let \( \mathcal L \) be the set of all such languages.
	For a fixed \( V \), there are only countably many rewrite systems with this choice of \( \Sigma \) and \( V \).
	Hence, the set \( \mathcal G_V \) of all grammars with fixed \( V \) is a finite union (over all start symbols) of countable sets.
	Therefore \( \mathcal L_V = \qty{\mathcal L(G) \mid G \in \mathcal G_V} \) is also countable.

	By the previous result, we can define \( \mathcal L_n = \mathcal L_V \) for some \( n \)-element set \( V \).
	Now, \( \mathcal L = \bigcup_{n > 0} \mathcal L_n \), which is a countable union of countable sets and is thus countable.
\end{proof}
\begin{remark}
	The set of languages produced by grammars is countable, but the set of all languages \( \mathcal P(\mathbb W) \) is uncountable.
\end{remark}

\subsection{The Chomsky hierarchy}
Production rules may have certain properties.
\begin{definition}
	Let \( \alpha \to \beta \) be a production rule.
	We call this rule:
	\begin{enumerate}
		\item \emph{noncontracting}, if \( \abs{\alpha} \leq \abs{\beta} \);
		\item \emph{context-sensitive}, if \( \exists A \in V,\, \exists \gamma,\delta \in \Omega^\star,\, \exists \eta \in \Omega^+,\, \alpha = \gamma A \delta, \beta = \gamma \eta \delta \);
		\item \emph{context-free}, if \( \alpha = A \in V \) and \( \abs{\beta} > 0 \);
		\item \emph{regular}, if \( \alpha = A \in V \) and \( \beta \) is either \( a \in \Sigma \) or \( aB \in \Sigma V \).
	\end{enumerate}
	Regular implies context-free, context-free implies context-sensitive, context-sensitive implies noncontracting.
	Let \( \mathbb Q \) be any of the above four properties.
	We say that a grammar is \( \mathbb Q \) if all its production rules are \( \mathbb Q \).
	A language is \( \mathbb Q \) if it admits a grammar which is \( \mathbb Q \).
\end{definition}
\begin{theorem}[Chomsky]
	A language is noncontracting if and only if it is context-sensitive.
\end{theorem}
Chomsky used the following notation: a language \( \mathcal L \) is
\begin{itemize}
	\item \emph{type 0}, if it is of the form \( \mathcal L(G) \) for some \( G \);
	\item \emph{type 1}, if it is of the form \( \mathcal L(G) \) for some \( G \) context-sensitive;
	\item \emph{type 2}, if it is of the form \( \mathcal L(G) \) for some \( G \) context-free;
	\item \emph{type 3}, if it is of the form \( \mathcal L(G) \) for some \( G \) regular.
\end{itemize}
We can easily find production rules that are context-sensitive but not context-free, for example.
However, it is less obvious to show that there is a \emph{language} that can be defined using a context-sensitive grammar but no context-free grammar.
One thing motivating our work will be the development of techniques to distinguish the different classes of languages in the Chomsky hierarchy.

\subsection{Decision problems}
We present three important decision problems.
\begin{enumerate}
	\item Consider the \emph{word problem}.
		The input to this problem is a grammar and a word; the question is to determine whether the word lies in the language generated by the grammar.
	\item The \emph{emptiness problem} considers a grammar \( G \).
		The question is whether \( \mathcal L(G) = \varnothing \).
	\item The \emph{equivalence problem} asks whether two grammars \( G, G' \) are equivalent.
\end{enumerate}
\begin{definition}
	We call a problem \emph{solvable} if there is an algorithm that gives the correct answer.
	Otherwise, we call such a problem \emph{unsolvable}.
\end{definition}
Posed for all grammars, all three problems above are unsolvable.
However, when restricted to certain classes of the Chomsky hierarchy, the problems are more approachable.
\begin{lemma}
	If \( G \) is a noncontracting grammar and \( w \in \mathbb W \), there exists a bound \( N \in \mathbb N \) depending only on \( \abs{w} \) and \( \abs{\Omega} \) such that \( w \in \mathcal L(G) \) if and only if \( w \) has a \( G \)-derivation of length at most \( N \).
\end{lemma}
\begin{proof}
	Consider a \( G \)-derivation \( S = \sigma_0, \dots, \sigma_n = w \) of \( w \), and consider the length of each element of the sequence.
	As the grammar is noncontracting, the sequence \( 1 = \abs{\sigma_0}, \dots, \abs{\sigma_n} = \abs{w} \) is nondecreasing.
	Consider a part of the derivation \( \sigma_i, \dots, \sigma_{i+k} \) for which the length of the \( \abs{\sigma_i} \) does not change, so \( \abs{\sigma_i} = \abs{\sigma_{i+k}} \).
	If \( \sigma_r = \sigma_s \) for some \( r \neq s \in \qty{i, \dots, i+k} \), we can shrink the derivation to \( \sigma_i, \dots, \sigma_r, \sigma_{s+1}, \dots, \sigma_{i+k} \).

	Therefore, without loss of generality, we can assume \( \sigma_0, \dots, \sigma_n \) is a derivation of minimal length, so all \( \sigma_i \) are distinct.
	Then by the pigeonhole principle,
	\[ n \leq \sum_{\ell=1}^{\abs{w}} \abs{\Omega}^\ell = N \]
\end{proof}
\begin{corollary}
	The word problem is solvable on noncontracting, context-sensitive, context-free, and regular grammars.
\end{corollary}
\begin{proof}
	Let \( w \in \mathbb W \).
	There is a finite, enumerable collection of possible derivations for \( w \) by the above lemma.
	Check each derivation manually.
\end{proof}

\subsection{Closure problems}
Closure problems are concerned with operations on languages to produce new languages.
Let \( L, M \) be languages.
Commonly used operations include \( LM, L \cup M, L \cap M, L \setminus M, \mathbb W^+ \setminus L \).
Note that we use \( \mathbb W^+ \setminus L \) instead of \( L^c \) because noncontracting languages cannot contain the empty word.
If \( \mathcal C \) is a class of languages, such as the class of all regular languages, we say that \( \mathcal C \) is \emph{closed} under an operation if applying that operation to elements of \( \mathcal C \) yields a result which also lies in \( \mathcal C \).
We would like to see which classes are closed under which operations.
Note that some closure properties imply others; for instance, closure under complement and intersection implies closure under union by De Morgan's laws.

\begin{definition}
	Let \( G = (\Sigma, V, P, S), G' = (\Sigma, V', P', S') \) be grammars.
	Then \( H = (\Sigma, V \cup V' \cup \qty{T}, P^\star, T) \) is called the \emph{concatenation grammar}, where \( T \) is a new variable, and
	\[ P^\star = P \cup P' \cup \qty{T \to SS'} \]
	\( H' = (\Sigma, V \cup V' \cup \qty{T}, P^{\star\star}, T) \) is called the \emph{union grammar}, where \( T \) is a new variable, and
	\[ P^{\star\star} = P \cup P' \cup \qty{T \to S, T \to S'} \]
\end{definition}
\begin{remark}
	\( \mathcal L(G)\mathcal L(G') \subseteq \mathcal L(H) \) by construction, and \( \mathcal L(G) \cup \mathcal L(G') \subseteq \mathcal L(H') \), but it is not true \emph{a priori} that the converse holds, because \( P \) and \( P' \) could share some variables.
	We can assume that \( V, V' \) are disjoint by relabelling, but that is insufficient for the converses to hold, because there may be interaction on the level of letters in \( \Sigma \), which cannot be relabelled.
	The concatenation grammar on context-free grammars is context-free, and the union grammar on regular languages is regular.
\end{remark}
\begin{definition}
	A production rule \( \alpha \to \beta \) is \emph{variable-based} if all symbols occuring in \( \alpha \) are variables.
	A grammar is called variable-based if all its rules are variable-based.
\end{definition}
\begin{remark}
	Regular and context-free languages are variable-based.
	Context-sensitive languages are not all variable-based.
\end{remark}
\begin{lemma}
	Every grammar is equivalent to a variable-based grammar.
\end{lemma}
\begin{proof}
	Let \( G = (\Sigma, V, P, S) \).
	For each letter \( a \in \Sigma \), we allocate a new variable \( X_a \).
	We define the map \( X \colon \Omega \to \Omega \) by \( X(a) = X_a \) for \( a \in \Sigma \), and \( X(A) = A \) for \( A \in V \).
	Then \( X \) extends in the natural way to a map \( X \colon \Omega^\star \to \Omega^\star \).
	We can map each production rule in \( G \) to a version that uses only variables and no letters by applying \( X \) to both sides.
	Hence, we define \( P' = \qty{X(\alpha) \to X(\beta) \mid \alpha \to \beta \in P} \).
	Then, defining \( P'' = \qty{X_a \to a \mid a \in \Sigma} \), let \( G' = (\Sigma, V \cup \qty{X_a \mid a \in \Sigma}, P' \cup P'', S) \).
	This grammar is variable-based and so it suffices to show that it defines the same language as \( G \).

	Any \( G \)-derivation of \( w \) is transformed into a \( G' \)-derivation of \( X(w) \) by the operation \( \alpha \to X(\alpha) \).
	Similarly, if we have a \( G' \)-derivation that contains no letters anywhere, all strings occuring are of the form \( X(\alpha) \) for some \( \alpha \in \Omega^\star \), and the operation of replacing all occurences of \( X_a \) with \( a \) transforms that derivation into a \( G \)-derivation.
	Thus, \( w \in \mathcal L(G) \) if and only if \( X(w) \in \mathcal D(G', S) \).

	If \( X(w) \in \mathcal D(G', S) \) then, by applying rules of the form \( X_a \to a \) as needed, we have \( w \in \mathcal L(G') \).

	Conversely, suppose \( w \in \mathcal L(G') \) and let \( S = \sigma_0, \dots, \sigma_m = w \) be a \( G' \)-derivation of \( w \).
	Applying the operation \( X \) to this derivation, we obtain a sequence \( S = \tau_0, \dots, \tau_m \).
	This sequence is not necessarily a \( G' \)-derivation.
	If \( \sigma_i \xrightarrow {G'}_1 \sigma_{i+1} \) was an application of a rule of the form \( X(\alpha) \to X(\beta) \), then the same rule gives \( X(\sigma_i) \xrightarrow {G'}_1 X(\sigma_{i+1}) \).
	In the other case, \( \sigma_i \xrightarrow {G'}_1 \sigma_{i+1} \) was an application of a rule of the form \( X_a \to a \), so applying \( X \) gives \( X(\sigma_i) = X(\sigma_{i+1}) \).
	Since for each letter \( a \) there is only one production rule that produces \( a \), we know that \( \abs{w} \)-many steps of the derivation must be of this form.
	Thus, removing these steps will make the remainder of the sequence \( \tau_0, \dots, \tau_m \) a \( G' \)-derivation of length \( m - \abs{w} \) of \( X(w) \).
	Then \( w \in \mathcal L(G) \) as required.
\end{proof}
\begin{remark}
	The classes of context-free, context-sensitive, and noncontracting grammars are stable under the action of turning a grammar into its equivalent variable-based grammar; all added rules are regular.
	Regularity is not necessarily preserved.
\end{remark}
\begin{theorem}
	Let \( G = (\Sigma, V, P, S), G' = (\Sigma, V', P', S') \) be variable-based grammars with \( V \cap V' = \varnothing \).
	Then \( \mathcal L(H) = \mathcal L(G)\mathcal L(G') \), and \( \mathcal L(H') = \mathcal L(G) \cup \mathcal L(G') \).
	The classes of regular, context-free, context-sensitive, and noncontracting languages are closed under union.
	The classes of context-free, context-sensitive, and noncontracting languages are closed under concatenation.
\end{theorem}
\begin{proof}
	First, convert \( G, G' \) into variable-based grammars with disjoint variable sets.
	Then, the concatenation grammar and union grammar for \( G, G' \) must produce the required language by disjointness.
\end{proof}

\subsection{The empty word}
\begin{remark}
	By induction, we can easily show that noncontracting grammars cannot produce the empty word, so we usually work with \( \mathbb W^+ \) in place of \( \mathbb W \).
	In general, adding the rule \( S \to \varepsilon \) to a grammar in order to allow the empty word may introduce side-effects due to reuse of \( S \).
\end{remark}
\begin{definition}
	A rule \( \alpha \to \beta \) is \emph{\( S \)-safe} if it does not contain \( S \) in \( \beta \).
	A grammar is \emph{\( \varepsilon \)-adequate} if all rules are \( S \)-safe.
\end{definition}
An \( \varepsilon \)-adequate grammar admits the addition of the rule \( S \to \varepsilon \) converting the language \( \mathcal L(G) \) into \( \mathcal L(G) \cup \qty{\varepsilon} \).
We can easily convert a grammar into an equivalent \( \varepsilon \)-adequate grammar by mapping \( G = (\Sigma, V, P, S) \) to \( G' = (\Sigma, V \cup \qty{T}, P \cup \qty{T \to S}, T) \) where \( T \) is a new variable.
